记一次 MySQL 关键字多字段模糊查询优化

2018-05-07

MySQL 5.7+

最近线上某个业务的单表关键字模糊查询一直比较缓慢，该业务需要通过关键字对多个字段同时进行模糊查询，故根据 MyBatis 动态 SQL 按如下实现方式实现：

<if test="keywords != null">
AND 
(  
materials_catalog.materials_name 
LIKE CONCAT(CONCAT('%', #{keywords, jdbcType=VARCHAR}), '%')
OR materials_catalog.materials_name_helper 
LIKE UPPER(CONCAT(CONCAT('%', #{keywords, jdbcType=VARCHAR}), '%'))
OR materials_catalog.materials_sku 
LIKE CONCAT(CONCAT('%', #{keywords, jdbcType=VARCHAR}), '%')
OR materials_catalog.materials_sku_hepler 
LIKE UPPER(CONCAT(CONCAT('%', #{keywords, jdbcType=VARCHAR}), '%'))
OR materials_catalog.materials_common_name 
LIKE CONCAT(CONCAT('%', #{keywords, jdbcType=VARCHAR}), '%')
OR materials_catalog.invite_no 
LIKE CONCAT(CONCAT('%', #{keywords, jdbcType=VARCHAR}), '%')
OR materials_catalog.factory_name 
LIKE CONCAT(CONCAT('%', #{keywords, jdbcType=VARCHAR}),'%')
OR materials_catalog.factory_name_helper 
LIKE UPPER(CONCAT(CONCAT('%', #{keywords, jdbcType=VARCHAR}), '%'))
OR materials_catalog.certificate_no 
LIKE CONCAT(CONCAT('%', #{keywords, jdbcType=VARCHAR}), '%')
)
</if>

通过分析索引和数据，发现这些列都已经加了索引，可能导致的原因就是某些列存在值为 null 的情形。故将原因初步定位于 null 值列对索引的影响。

在维基百科中搜索到一篇资料：NULL(SQL))
里面有这么一段话：

Effect on index operation
Some SQL products do not index keys containing NULLs. For instance, PostgreSQL versions prior to 8.3 did not, with the documentation for a B-tree index stating that[21]

B-trees can handle equality and range queries on data that can be sorted into some ordering. In particular, the PostgreSQL query planner will consider using a B-tree index whenever an indexed column is involved in a comparison using one of these operators: < ≤ = ≥ >

Constructs equivalent to combinations of these operators, such as BETWEEN and IN, can also be implemented with a B-tree index search. (But note that IS NULL is not equivalent to = and is not indexable.)

In cases where the index enforces uniqueness, NULLs are excluded from the index and uniqueness is not enforced between NULLs. Again, quoting from the PostgreSQL documentation:[22]

When an index is declared unique, multiple table rows with equal indexed values will not be allowed. Nulls are not considered equal. A multicolumn unique index will only reject cases where all of the indexed columns are equal in two rows.

This is consistent with the SQL:2003-defined behavior of scalar Null comparisons.

Another method of indexing Nulls involves handling them as not distinct in accordance with the SQL:2003-defined behavior. For example, Microsoft SQL Server documentation states the following:[23]

For indexing purposes, NULLs compare as equal. Therefore, a unique index, or UNIQUE constraint, cannot be created if the keys are NULL in more than one row. Select columns that are defined as NOT NULL when columns for a unique index or unique constraint are chosen.

Both of these indexing strategies are consistent with the SQL:2003-defined behavior of Nulls. Because indexing methodologies are not explicitly defined by the SQL:2003 standard, indexing strategies for Nulls are left entirely to the vendors to design and implement.

主要意思就是索引一般使用 B-Tree 实现，而 B-Tree 是为了处理等于、范围查找、排序等操作，对 null 没法使用 = 操作

在高性能 MySQL 第二版 page 64 中也提到：

要尽量避免 NULL
要尽可能地把字段定义为 NOT NULL。即使应用程序无须保存 NULL（没有值），也有许多表包含了可空列（Nullable Column）,这仅仅是因为它为默认选项。除非真的要保存 NULL，否则就把列定义为 NOT NULL。

MySQL难以优化引用了可空列的查询，它会使索引、索引统计和值更加复杂。可空列需要更多的储存空间，还需要在MySQL内部进行特殊处理。当可空列被索引的时候，每条记录都需要一个额外的字节，还可能导致 MyISAM 中固定大小的索引(例如一个整数列上的索引)变成可变大小的索引。

即使要在表中储存「没有值」的字段，还是有可能不使用 NULL 的。考虑使用 0、特殊值或空字符串来代替它。

把 NULL 列改为 NOT NULL 带来的性能提升很小，所以除非确定它引入了问题，否则就不要把它当作优先的优化措施。然后，如果计划对列进行索引，就要尽量避免把它设置为可空。

现采用如下优化方式：

使用 CONCAT_WS 函数将所有的字段连接以后在模糊查询

<if test="keywords != null">
	AND 
	CONCAT_WS 
	(  
	   '',
	   materials_catalog.materials_name, 
	   materials_catalog.materials_name_helper,
	   materials_catalog.materials_sku,
	   materials_catalog.materials_sku_hepler,
	   	materials_catalog.materials_common_name,
	   materials_catalog.invite_no,
	   materials_catalog.factory_name,
	   materials_catalog.factory_name_helper,
	   materials_catalog.certificate_no
	) 
	LIKE CONCAT(CONCAT('%', #{keywords, jdbcType=VARCHAR}), '%')
</if>

但是如此就不能使用索引了，此时可以利用 MySQL 5.7+ 虚拟列的功能对 CONCAT_WS 的字段创建一个虚拟列：

ALTER TABLE purchase_supply_catalog_using_info 
ADD virtual_keywords VARCHAR ( 500 ) generated always AS (
CONCAT_WS(
'',
materials_catalog.materials_name,
materials_catalog.materials_name_helper,
materials_catalog.materials_sku,
materials_catalog.materials_sku_hepler,
materials_catalog.materials_commen_name,
materials_catalog.invite_no,
materials_catalog.factory_name,
materials_catalog.factory_name_helper,
materials_catalog.certificate_no 
) 
) VIRTUAL;

并对虚拟列创建索引：

1	ALTER TABLE purchase_supply_catalog_using_info ADD INDEX idx_virtual_keywords ( virtual_keywords );

至此，通过实践积累，了解了 SQL 中 NULL 的特殊性， CONCAT 和 CONCAT_WS 函数的差异，虚拟列和虚拟列索引等相关知识。